Wayne
Generate-then-Verify: Reconstructing Data from Limited Published Statistics
Liu, Terrance, Xiao, Eileen, Smith, Adam, Thaker, Pratiksha, Wu, Zhiwei Steven
We study the problem of reconstructing tabular data from aggregate statistics, in which the attacker aims to identify interesting claims about the sensitive data that can be verified with 100% certainty given the aggregates. Successful attempts in prior work have conducted studies in settings where the set of published statistics is rich enough that entire datasets can be reconstructed with certainty. In our work, we instead focus on the regime where many possible datasets match the published statistics, making it impossible to reconstruct the entire private dataset perfectly (i.e., when approaches in prior work fail). We propose the problem of partial data reconstruction, in which the goal of the adversary is to instead output a $\textit{subset}$ of rows and/or columns that are $\textit{guaranteed to be correct}$. We introduce a novel integer programming approach that first $\textbf{generates}$ a set of claims and then $\textbf{verifies}$ whether each claim holds for all possible datasets consistent with the published aggregates. We evaluate our approach on the housing-level microdata from the U.S. Decennial Census release, demonstrating that privacy violations can still persist even when information published about such data is relatively sparse.
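To make the generate-then-verify idea concrete, here is a minimal sketch under strong simplifying assumptions. It replaces the integer programming described above with brute-force enumeration over a toy domain of two binary attributes, and it assumes the published statistics are just the row count and the two one-way marginals. The function names (`aggregates`, `consistent_datasets`, `verified_claims`) are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Toy domain (an assumption): every row is a pair of binary attributes (a, b).
DOMAIN = list(product([0, 1], repeat=2))


def aggregates(dataset):
    """Assumed published statistics: row count and the two one-way marginals."""
    return (len(dataset), sum(r[0] for r in dataset), sum(r[1] for r in dataset))


def consistent_datasets(published):
    """Enumerate every multiset of rows whose aggregates match the published ones."""
    n = published[0]
    for candidate in combinations_with_replacement(DOMAIN, n):
        if aggregates(candidate) == published:
            yield Counter(candidate)


def verified_claims(published):
    """Return claims (row, k), meaning 'row appears at least k times',
    that hold in every dataset consistent with the published aggregates."""
    datasets = list(consistent_datasets(published))
    if not datasets:
        return []
    # Generate: read candidate claims off one consistent dataset.
    claims = [(row, k)
              for row, count in datasets[0].items()
              for k in range(1, count + 1)]
    # Verify: keep only the claims that no consistent dataset contradicts.
    return [(row, k) for row, k in claims
            if all(d[row] >= k for d in datasets)]


if __name__ == "__main__":
    # Three rows, two with a = 1 and two with b = 1: two datasets are
    # consistent, so full reconstruction fails, but one claim is certain.
    print(verified_claims((3, 2, 2)))
```

With aggregates `(3, 2, 2)`, both consistent datasets contain the row `(1, 1)` at least once, so that single claim survives verification even though the full dataset is ambiguous. Enumeration only works at toy scale; an integer programming solver is what makes the verification step feasible on realistic data.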
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- North America > United States > Maryland > Baltimore (0.04)
Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval
Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems: (i) summarizing a text document as a set of keywords or a caption for image recommendation, and (ii) generating an opinion summary that offers a good mix of relevance and sentiment with respect to the text document. Initially, we present our work on recommending images to enhance a substantial amount of existing plain-text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that rank aggregation and relevance feedback, which are typically used in document tagging and text information retrieval, also help to improve image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. Opinion summarization combines the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and extract a summary simultaneously. The two tasks conflict in the sense that the demand for compression may drop sentiment-bearing sentences, and the demand for sentiment detection may bring in redundant sentences. Using submodularity, we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment, along with a good ROUGE score. We also compare the performance of the proposed submodular functions.
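As an illustration of how a submodular objective is typically optimized for summarization, the following is a minimal sketch of greedy selection under a sentence budget. The abstract does not specify the thesis's submodular functions, so this uses plain word coverage as a stand-in objective, with no sentiment term. The names `coverage` and `greedy_summary` are hypothetical.

```python
def coverage(selected, sentences):
    """Stand-in objective (an assumption): number of distinct words covered
    by the selected sentences. This function is monotone and submodular."""
    covered = set()
    for i in selected:
        covered |= set(sentences[i].lower().split())
    return len(covered)


def greedy_summary(sentences, budget):
    """Greedily add the sentence with the largest marginal gain until the
    budget is reached or no sentence adds anything new."""
    selected = []
    while len(selected) < budget:
        base = coverage(selected, sentences)
        best, best_gain = None, 0
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sentences) - base
            if gain > best_gain:  # strict: ties go to the earliest sentence
                best, best_gain = i, gain
        if best is None:  # every remaining sentence is fully redundant
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    reviews = [
        "the plot was great",
        "the plot was great",
        "terrible acting and pacing",
    ]
    print(greedy_summary(reviews, budget=2))
```

Because the marginal gain of a sentence shrinks as more of its words are already covered, the duplicate sentence is never picked. That diminishing-returns property is what lets a submodular objective discourage redundancy.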
- South America > Argentina (0.04)
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
- (8 more...)
- Transportation > Ground > Road (1.00)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Clustering US Counties to Find Patterns Related to the COVID-19 Pandemic
Brown, Cora, Milstein, Sarah, Sun, Tianyi, Zhao, Cooper
When COVID-19 first started spreading and quarantine was implemented, the Society for Industrial and Applied Mathematics (SIAM) Student Chapter at the University of Minnesota-Twin Cities began a collaboration with Ecolab to use our skills as data scientists and mathematicians to extract useful insights from relevant data relating to the pandemic. This collaboration consisted of multiple groups working on different projects. In this write-up we focus on using clustering techniques to help us find groups of similar counties in the US and use that to help us understand the pandemic. Our team for this project consisted of University of Minnesota students Cora Brown, Sarah Milstein, Tianyi Sun, and Cooper Zhao, with help from Ecolab Data Scientist Jimmy Broomfield and University of Minnesota student Skye Ke. In the sections below we describe all of the work done for this project. In Section 2, we list the data we gathered, as well as the feature engineering we performed. In Section 3, we describe the metrics we used for evaluating our models. In Section 4, we explain the methods we used for interpreting the results of our various clustering approaches. In Section 5, we describe the different clustering methods we implemented. In Section 6, we present the results of our clustering techniques and provide relevant interpretation. Finally, in Section 7, we provide some concluding remarks comparing the different clustering methods.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- (26 more...)
- Health & Medicine > Epidemiology (0.86)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.63)
- Health & Medicine > Therapeutic Area > Immunology (0.63)
Graph Attention Networks Unveil Determinants of Intra- and Inter-city Health Disparity
Liu, Chenyue, Fan, Chao, Mostafavi, Ali
Understanding the determinants underlying variations in urban health status is important for informing urban design and planning, as well as public health policies. Multiple heterogeneous urban features could modulate the prevalence of diseases across different neighborhoods in cities and across different cities. This study examines heterogeneous features related to socio-demographics, population activity, mobility, and the built environment and their non-linear interactions to examine intra- and inter-city disparity in prevalence of four disease types: obesity, diabetes, cancer, and heart disease. Features related to population activity, mobility, and facility density are obtained from large-scale anonymized mobility data. These features are used in training and testing graph attention network (GAT) models to capture non-linear feature interactions as well as spatial interdependence among neighborhoods. We tested the models in five U.S. cities across the four disease types. The results show that the GAT model can predict the health status of people in neighborhoods based on the top five determinant features. The findings unveil that population activity and built-environment features along with socio-demographic features differentiate the health status of neighborhoods to such a great extent that a GAT model could predict the health status using these features with high accuracy. The results also show that the model trained on one city can predict health status in another city with high accuracy, allowing us to quantify the inter-city similarity and discrepancy in health status. The model and findings provide novel approaches and insights for urban designers, planners, and public health officials to better understand and improve health disparities in cities by considering the significant determinant features and their interactions.
- North America > United States > Texas > Brazos County > College Station (0.14)
- North America > United States > Arkansas > Cross County (0.05)
- North America > United States > New York > Queens County > New York City (0.04)
- (10 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.87)
Anomaly Detection for High-Dimensional Data Using Large Deviations Principle
Guggilam, Sreelekha, Chandola, Varun, Patra, Abani
Most current anomaly detection methods suffer from the curse of dimensionality when dealing with high-dimensional data. We propose an anomaly detection algorithm that can scale to high-dimensional data using concepts from the theory of large deviations. The proposed Large Deviations Anomaly Detection (LAD) algorithm is shown to outperform state-of-the-art anomaly detection methods on a variety of large and high-dimensional benchmark data sets. Exploiting the ability of the algorithm to scale to high-dimensional data, we propose an online anomaly detection method to identify anomalies in a collection of multivariate time series. We demonstrate the applicability of the online algorithm in identifying counties in the United States with anomalous trends in COVID-19 related cases and deaths. Several of the identified anomalous counties correlate with counties that had a documented poor response to the COVID-19 pandemic.
- North America > United States > New York > Erie County > Buffalo (0.04)
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- North America > United States > Wyoming > Albany County > Laramie (0.04)
- (10 more...)
Lightweight Data Fusion with Conjugate Mappings
Dean, Christopher L., Lee, Stephen J., Pacheco, Jason, Fisher, John W. III
We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks. The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information: primary data, which are well-characterized but with limited availability, and auxiliary data, readily available but lacking a well-characterized statistical relationship to the latent quantity of interest. The lack of a forward model for the auxiliary data precludes the use of standard data fusion approaches, while the inability to acquire latent variable observations severely limits direct application of most supervised learning methods. LDF addresses these issues by utilizing neural networks as conjugate mappings of the auxiliary data: nonlinear transformations into sufficient statistics with respect to the latent variables. This facilitates efficient inference by preserving the conjugacy properties of the primary data and leads to compact representations of the latent variable posterior distributions. We demonstrate the LDF methodology on two challenging inference problems: (1) learning electrification rates in Rwanda from satellite imagery, high-level grid infrastructure, and other sources; and (2) inferring county-level homicide rates in the USA by integrating socio-economic data using a mixture model of multiple conjugate mappings.
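The idea of mapping auxiliary data into sufficient statistics can be sketched with the simplest conjugate pair, a Beta prior over a Bernoulli rate. This is only an illustrative assumption: the abstract does not state which likelihoods LDF uses, and the fixed softplus-linear map below stands in for a trained neural network. All names (`conjugate_mapping`, `beta_posterior`, the weights) are hypothetical.

```python
import math


def softplus(x):
    """Smooth map from the real line to positive values."""
    return math.log1p(math.exp(x))


def conjugate_mapping(features, weights):
    """Hypothetical stand-in for a trained network: turns auxiliary features
    into nonnegative pseudo-counts (pseudo-successes, pseudo-failures)."""
    s = softplus(sum(w * x for w, x in zip(weights[0], features)))
    f = softplus(sum(w * x for w, x in zip(weights[1], features)))
    return s, f


def beta_posterior(prior, successes, failures, pseudo=(0.0, 0.0)):
    """Conjugate Beta update: primary counts and mapped auxiliary
    pseudo-counts both enter as additions to the Beta parameters."""
    a, b = prior
    return a + successes + pseudo[0], b + failures + pseudo[1]


if __name__ == "__main__":
    prior = (1.0, 1.0)                       # uniform prior over the rate
    aux = [0.8, 0.1]                         # made-up auxiliary features
    weights = [[1.5, 0.0], [0.0, 1.5]]       # made-up mapping weights
    pseudo = conjugate_mapping(aux, weights)
    a, b = beta_posterior(prior, successes=3, failures=1, pseudo=pseudo)
    print(f"posterior Beta({a:.2f}, {b:.2f}), mean {a / (a + b):.2f}")
```

Because the auxiliary data enter only through pseudo-counts, the posterior stays in the Beta family and remains a two-parameter summary, which is the kind of compactness the abstract attributes to preserving conjugacy.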
- Africa > Rwanda (0.25)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (20 more...)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Economy (0.65)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)
AICov: An Integrative Deep Learning Framework for COVID-19 Forecasting with Population Covariates
Fox, Geoffrey C., von Laszewski, Gregor, Wang, Fugang, Pyne, Saumyadipta
The COVID-19 pandemic has profound global consequences on health, economic, social, political, and almost every major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, which provides an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. We have integrated multiple different strategies into AICov, including the ability to use deep learning strategies based on LSTM and even modeling. To demonstrate our approach, we have conducted a pilot that integrates population covariates from multiple sources. Thus, AICov not only includes data on COVID-19 cases and deaths but, more importantly, the population's socioeconomic, health and behavioral risk factors at a local level. The compiled data are fed into AICov, and thus we obtain improved prediction by integration of the data to our model as compared to one that only uses case and death data.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Washington (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (10 more...)
Beyond Imitation: Generative and Variational Choreography via Machine Learning
Pettee, Mariel, Shimmin, Chase, Duhaime, Douglas, Vidrin, Ilya
Our team of dance artists, physicists, and machine learning researchers has collectively developed several original, configurable machine-learning tools to generate novel sequences of choreography as well as tunable variations on input choreographic sequences. We train recurrent neural network and autoencoder architectures on a dataset of movements captured as 53 three-dimensional points at each timestep. Sample animations of generated sequences and an interactive version of our model can be found at http://www.beyondimitation.com.
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Ford replaces CEO Mark Fields in push to transform business
FILE - In this April 12, 2017 file photo, Ford Motor Co. President and CEO Mark Fields speaks during a media preview of the 2018 Lincoln Navigator at the New York International Auto Show in New York. Ford is replacing its CEO amid questions about its current performance and future strategy, a person familiar with the situation has said. Fields will be replaced by Jim Hackett, who joined Ford's board in 2013.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Michigan > Wayne County > Wayne (0.05)
- North America > United States > Michigan > Wayne County > Dearborn (0.05)
- (6 more...)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
Trump tweets himself praise as Ford dumps plan for Mexico plant, looks to hire more in Michigan
WASHINGTON – Ford scuttled a plan to build a new factory in Mexico Tuesday following criticism from Donald Trump, and just hours after the president-elect attacked General Motors for importing Mexican-made cars into the US. Following months of criticism from Trump for its investments in Mexico, Ford said it was spiking a plan to build a new $1.6 billion plant in San Luis Potosi, and would instead invest $700 million over the next four years to expand its Flat Rock Assembly Plant in Michigan to build electric and self-driving vehicles. Ford chief executive Mark Fields said the second-biggest U.S. automaker was hopeful Trump's policies will boost the U.S. manufacturing environment. "It's literally a vote of confidence around some of the pro-growth policies that he has been outlining and that's why we're making this decision to invest here in the U.S. and our plant here in Michigan," Fields told CNN. Earlier, GM became the latest multinational to end up in Trump's line of fire -- via Twitter as usual -- with the president-elect threatening to impose a tariff on GM's imports of a small number of Mexican-made Chevy Cruze cars to the U.S. Trump took to Twitter again to crow about the Ford reversal.
- South America > Bolivia > Potosí Department > Tomás Frías Province > Potosí (0.25)
- North America > Mexico > San Luis Potosí (0.25)
- North America > Canada (0.18)
- (6 more...)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Government > Foreign Policy (1.00)
- (3 more...)